embedding-based model
Application and Validation of Geospatial Foundation Model Data for the Prediction of Health Facility Programmatic Outputs -- A Case Study in Malawi
Metz, Lynn, Haggard, Rachel, Moszczynski, Michael, Asbah, Samer, Mwase, Chris, Khomani, Patricia, Smith, Tyler, Cooper, Hannah, Mwale, Annie, Muslim, Arbaaz, Prasad, Gautam, Sun, Mimi, Shekel, Tomer, Paul, Joydeep, Carter, Anna, Shetty, Shravya, Green, Dylan
The reliability of routine health data in low- and middle-income countries (LMICs) is often constrained by reporting delays and incomplete coverage, necessitating the exploration of novel data sources and analytics. Geospatial Foundation Models (GeoFMs) offer a promising avenue by synthesizing diverse spatial, temporal, and behavioral data into mathematical embeddings that can be efficiently used for downstream prediction tasks. This study evaluated the predictive performance of three GeoFM embedding sources, namely Google Population Dynamics Foundation Model (PDFM), Google AlphaEarth (derived from satellite imagery), and mobile phone call detail records (CDR), for modeling 15 routine health programmatic outputs in Malawi, and compared their utility to traditional geospatial interpolation methods. We trained XGBoost models on data from 552 health catchment areas (January 2021 to May 2023) using an 80/20 train/test split, with 5-fold cross-validation on the training set, and assessed performance with R². While predictive performance was mixed, the embedding-based approaches improved upon baseline geostatistical methods for 13 of 15 (87%) indicators tested. A Multi-GeoFM model integrating all three embedding sources produced the most robust predictions, achieving mean 5-fold cross-validated R² values of 0.63 for population density, 0.57 for new HIV cases, and 0.47 for child vaccinations, and test-set R² values of 0.64, 0.68, and 0.55, respectively. Prediction was poor for targets with low primary data availability, such as TB and malnutrition cases. These results demonstrate that GeoFM embeddings provide a modest predictive improvement for select health and demographic outcomes in an LMIC context. We conclude that integrating multiple GeoFM sources is an efficient and valuable tool for supplementing and strengthening constrained routine health information systems.
- Africa > Malawi (0.38)
- North America > United States (0.29)
- Asia > Middle East > Oman > Muscat Governorate > Muscat (0.04)
- North America > United States > California > San Francisco County > San Francisco (0.14)
- Asia > China > Beijing > Beijing (0.04)
- North America > United States > Nevada (0.04)
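The evaluation protocol described in the Malawi abstract (an 80/20 train/test split, 5-fold cross-validation within the training portion, and R² as the metric) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the study's code: a one-feature least-squares regressor (`fit_linear`, hypothetical) stands in for XGBoost, and the helper names and shuffling seed are illustrative.

```python
import random
from statistics import mean


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot (targets must vary)."""
    y_bar = mean(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - y_bar) ** 2 for y in y_true)
    return 1.0 - ss_res / ss_tot


def train_test_split(n, test_frac=0.2, seed=0):
    """Shuffle indices 0..n-1 and hold out test_frac of them as the test set."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = round(n * test_frac)
    return idx[n_test:], idx[:n_test]


def k_fold(indices, k=5):
    """Yield (train, validation) index lists; fold i takes every k-th index."""
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, folds[i]


def fit_linear(xs, ys):
    """Hypothetical stand-in regressor (1-D least squares); the study used XGBoost."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    intercept = y_bar - slope * x_bar
    return lambda x: intercept + slope * x


def evaluate(xs, ys, k=5, seed=0):
    """Return (mean k-fold CV R² on the training split, R² on the held-out test split)."""
    train_idx, test_idx = train_test_split(len(xs), seed=seed)
    cv_scores = []
    for tr, va in k_fold(train_idx, k):
        model = fit_linear([xs[i] for i in tr], [ys[i] for i in tr])
        cv_scores.append(r2_score([ys[i] for i in va], [model(xs[i]) for i in va]))
    # Refit on the full training split before scoring the untouched test split.
    final = fit_linear([xs[i] for i in train_idx], [ys[i] for i in train_idx])
    test_r2 = r2_score([ys[i] for i in test_idx], [final(xs[i]) for i in test_idx])
    return mean(cv_scores), test_r2
```

Reporting both numbers mirrors the abstract's distinction between cross-validated R² (used during model development) and test-set R² (measured once on the held-out 20%).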
Finetuning Generative Large Language Models with Discrimination Instructions for Knowledge Graph Completion
Liu, Yang, Tian, Xiaobin, Sun, Zequn, Hu, Wei
Traditional knowledge graph (KG) completion models learn embeddings to predict missing facts. Recent works attempt to complete KGs in a text-generation manner with large language models (LLMs). However, they need to ground the output of LLMs to KG entities, which inevitably introduces errors. In this paper, we present a finetuning framework, DIFT, aiming to unleash the KG completion ability of LLMs and avoid grounding errors. Given an incomplete fact, DIFT employs a lightweight model to obtain candidate entities and finetunes an LLM with discrimination instructions to select the correct one from the given candidates. To improve performance while reducing instruction data, DIFT uses a truncated sampling method to select useful facts for finetuning and injects KG embeddings into the LLM. Extensive experiments on benchmark datasets demonstrate the effectiveness of our proposed framework.
- Asia > China > Jiangsu Province > Nanjing (0.04)
- North America > United States > California > Los Angeles County > Los Angeles (0.04)
- Europe > United Kingdom > Scotland (0.04)
- North America > United States > New York (0.04)
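The candidate-then-discriminate idea in the DIFT abstract can be illustrated with a small sketch. It assumes TransE as the lightweight embedding model and uses made-up prompt wording; `top_k_candidates` and `discrimination_instruction` are hypothetical helpers, not DIFT's released code, and the finetuning, truncated sampling, and embedding injection steps are omitted.

```python
import math


def transe_score(h, r, t):
    """TransE plausibility: negative L2 distance between h + r and t."""
    return -math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))


def top_k_candidates(head, relation, entity_emb, rel_emb, k=3):
    """Rank all entities (except the head) as tails for (head, relation, ?)."""
    h, r = entity_emb[head], rel_emb[relation]
    ranked = sorted(entity_emb,
                    key=lambda e: transe_score(h, r, entity_emb[e]),
                    reverse=True)
    return [e for e in ranked if e != head][:k]


def discrimination_instruction(head, relation, candidates):
    """Hypothetical prompt template asking the LLM to choose one listed candidate."""
    options = "\n".join(f"{i + 1}. {c}" for i, c in enumerate(candidates))
    return (f"Complete the fact ({head}, {relation}, ?).\n"
            f"Choose the correct tail entity from the candidates below:\n{options}\n"
            "Answer with the entity name only.")


entity_emb = {"Paris": [0.0, 0.0], "France": [1.0, 0.0],
              "Germany": [0.0, 1.0], "Berlin": [1.0, 1.0]}
rel_emb = {"capital_of": [1.0, 0.0]}
cands = top_k_candidates("Paris", "capital_of", entity_emb, rel_emb, k=3)
prompt = discrimination_instruction("Paris", "capital_of", cands)
```

Because the LLM only chooses among entities that already exist in the KG, its answer needs no separate grounding step, which is the error source the abstract aims to avoid.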
Combining Embedding-Based and Semantic-Based Models for Post-hoc Explanations in Recommender Systems
Le, Ngoc Luyen, Abel, Marie-Hélène, Gouspillou, Philippe
In today's data-rich environment, recommender systems play a crucial role in decision support systems. They provide users with personalized recommendations and explanations for these recommendations. Embedding-based models, despite their widespread use, often lack interpretability, which can undermine trust and user engagement. This paper presents an approach that combines embedding-based and semantic-based models to generate post-hoc explanations in recommender systems, leveraging ontology-based knowledge graphs to improve interpretability and explainability. By organizing data within a structured framework, ontologies enable the modeling of intricate relationships between entities, which is essential for generating explanations. By combining embedding-based and semantic-based models for post-hoc explanations in recommender systems, the framework we define aims to produce meaningful and easy-to-understand explanations, enhancing user trust and satisfaction and potentially promoting the adoption of recommender systems across the e-commerce sector.
- Europe > France > Hauts-de-France > Oise > Compiègne (0.04)
- Europe > France > Occitanie > Hérault > Montpellier (0.04)
- Transportation > Passenger (1.00)
- Transportation > Ground > Road (1.00)
- Automobiles & Trucks > Manufacturer (0.68)
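A toy version of the two-stage idea in the recommender abstract: an embedding model ranks items, and an ontology-style set of (subject, property, value) triples then supplies a post-hoc explanation. This is a sketch under assumptions: the dot-product scorer, the triple format, the car data, and the explanation wording are all illustrative, not the authors' framework.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def recommend(user_vec, item_vecs, exclude=(), k=1):
    """Embedding-based step: rank unseen items by user-item dot product."""
    ranked = sorted((i for i in item_vecs if i not in exclude),
                    key=lambda i: dot(user_vec, item_vecs[i]), reverse=True)
    return ranked[:k]


def explain(item, liked_items, triples):
    """Semantic step: list (property, value) pairs the item shares with liked items."""
    props = {(p, o) for s, p, o in triples if s == item}
    reasons = []
    for liked in liked_items:
        shared = props & {(p, o) for s, p, o in triples if s == liked}
        for p, o in sorted(shared):
            reasons.append(f"{item} shares {p}={o} with {liked}, which you liked")
    return reasons


item_vecs = {"CarA": [0.9, 0.1], "CarB": [0.2, 0.8], "CarC": [0.8, 0.3]}
triples = [("CarA", "fuelType", "Electric"), ("CarA", "bodyStyle", "SUV"),
           ("CarC", "fuelType", "Electric"), ("CarC", "bodyStyle", "Sedan")]
top = recommend([1.0, 0.0], item_vecs, exclude={"CarA"})
why = explain(top[0], ["CarA"], triples)
```

The embedding model decides what to recommend, while the explanation is derived afterwards from explicit relations, so it stays readable even though the embedding itself is opaque.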
Extensible Motion-based Identification of XR Users using Non-Specific Motion Data
Rack, Christian, Kobs, Konstantin, Fernando, Tamara, Hotho, Andreas, Latoschik, Marc Erich
In this paper, we combine the strengths of distance-based and classification-based approaches for the task of identifying extended reality (XR) users by their movements. To this end, we explore an embedding-based model that leverages deep metric learning. We train the model on a dataset of users playing the VR game "Half-Life: Alyx" and conduct multiple experiments and analyses using a state-of-the-art classification-based model as a baseline. The results show that the embedding-based method 1) is able to identify new users from non-specific movements using only a few minutes of enrollment data, 2) can enroll new users within seconds, while retraining the baseline approach takes almost a day, 3) is more reliable than the baseline approach when little enrollment data is available, and 4) can identify new users from another dataset recorded with different VR devices. Altogether, our solution is a foundation for easily extensible XR user identification systems, applicable to a wide range of user motions. It also paves the way for production-ready models that XR practitioners could use without requiring the expertise, hardware, or data needed to train deep learning models.
- North America > United States > California > Los Angeles County > Long Beach (0.04)
- North America > United States > New York > New York County > New York City (0.04)
- Europe > Switzerland (0.04)
- Asia > Middle East > Jordan (0.04)
- Information Technology > Security & Privacy (1.00)
- Leisure & Entertainment > Games > Computer Games (0.66)
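Why enrollment takes seconds rather than a retraining run can be seen in a nearest-centroid sketch: once a metric-learning encoder exists, enrolling a user only stores the mean of their embeddings. The `Gallery` class and the tiny three-dimensional vectors are hypothetical, and the encoder that would produce embeddings from motion data is assumed and not shown.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return sum(a * b for a, b in zip(u, v)) / (norm_u * norm_v)


class Gallery:
    """Hypothetical enrollment store: one centroid per user, no model retraining."""

    def __init__(self):
        self.centroids = {}

    def enroll(self, user, embeddings):
        # Average the user's enrollment embeddings component-wise.
        n = len(embeddings)
        self.centroids[user] = [sum(col) / n for col in zip(*embeddings)]

    def identify(self, embedding):
        # Return the enrolled user whose centroid is most similar to the query.
        return max(self.centroids,
                   key=lambda u: cosine(self.centroids[u], embedding))


g = Gallery()
g.enroll("alice", [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0]])
g.enroll("bob", [[0.0, 1.0, 0.0], [0.1, 0.9, 0.1]])
```

A classification-based baseline would need a new output unit and retraining for every added user; here adding a user is a single averaging step.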
NeuSTIP: A Novel Neuro-Symbolic Model for Link and Time Prediction in Temporal Knowledge Graphs
Singh, Ishaan, Kaur, Navdeep, Gaur, Garima, Mausam
While Knowledge Graph Completion (KGC) on static facts is a mature field, Temporal Knowledge Graph Completion (TKGC), which incorporates validity time into static facts, is still in its nascent stage. KGC methods fall into multiple categories, including embedding-based, rule-based, GNN-based, and pretrained language-model-based approaches. However, these dimensions have not been explored for TKGs. To that end, we propose a novel temporal neuro-symbolic model, NeuSTIP, that performs link prediction and time interval prediction in a TKG. NeuSTIP learns temporal rules in the presence of Allen predicates, which ensure temporal consistency between neighboring predicates in a given rule. We further design a unique scoring function that uses the learned rules to evaluate the confidence of candidate answers during link prediction and time interval prediction. Our empirical evaluation on two time-interval-based TKGC datasets suggests that our model outperforms state-of-the-art models on both the link prediction and time interval prediction tasks.
- Information Technology > Artificial Intelligence > Representation & Reasoning > Rule-Based Reasoning (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Model-Based Reasoning (0.90)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Semantic Networks (0.83)
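The Allen predicates mentioned in the NeuSTIP abstract are the 13 relations of Allen's interval algebra. A minimal classifier for closed intervals `(start, end)` with `start < end` is sketched below, together with a hypothetical helper that checks whether consecutive atoms in a rule body satisfy required relations. This illustrates the predicates only; it is not NeuSTIP's rule learner or scoring function.

```python
def allen_relation(a, b):
    """Return the Allen relation of interval a with respect to interval b."""
    (s1, e1), (s2, e2) = a, b
    if not (s1 < e1 and s2 < e2):
        raise ValueError("intervals must satisfy start < end")
    if e1 < s2:
        return "before"
    if e2 < s1:
        return "after"
    if e1 == s2:
        return "meets"
    if e2 == s1:
        return "met-by"
    if s1 == s2 and e1 == e2:
        return "equals"
    if s1 == s2:
        return "starts" if e1 < e2 else "started-by"
    if e1 == e2:
        return "finishes" if s1 > s2 else "finished-by"
    if s2 < s1 and e1 < e2:
        return "during"
    if s1 < s2 and e2 < e1:
        return "contains"
    # Remaining case: partial overlap with distinct endpoints.
    return "overlaps" if s1 < s2 else "overlapped-by"


def rule_body_consistent(intervals, required):
    """Hypothetical check: each consecutive pair of body intervals has the required relation."""
    return all(allen_relation(a, b) == r
               for a, b, r in zip(intervals, intervals[1:], required))
```

For example, a rule whose body says someone studied somewhere and then worked there could require the study interval to be "before" or "meets" the work interval; candidate facts violating that relation would not fire the rule.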
Unsupervised Knowledge Graph Alignment by Probabilistic Reasoning and Semantic Embedding
Qi, Zhiyuan, Zhang, Ziheng, Chen, Jiaoyan, Chen, Xi, Xiang, Yuejia, Zhang, Ningyu, Zheng, Yefeng
Knowledge Graph (KG) alignment aims to discover the mappings (i.e., equivalent entities, relations, and others) between two KGs. Existing methods can be divided into embedding-based models and conventional reasoning- and lexical-matching-based systems. The former compute entity similarity via cross-KG embeddings, but they usually rely on an ideal supervised learning setting for good performance and lack appropriate reasoning to avoid logically wrong mappings; the latter address the reasoning issue but are poor at utilizing KG graph structures and entity contexts. In this study, we aim to combine these two solutions and propose an iterative framework named PRASE, based on probabilistic reasoning and semantic embedding. It learns KG embeddings via entity mappings from a probabilistic reasoning system named PARIS, and feeds the resulting entity mappings and embeddings back into PARIS for augmentation. The PRASE framework is compatible with different embedding-based models, and our experiments on multiple datasets demonstrate its state-of-the-art performance.
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
- Oceania > Australia > Victoria > Melbourne (0.04)
- Asia > Macao (0.04)
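The embedding side of alignment described in the PRASE abstract (computing entity similarity via cross-KG embeddings) can be sketched as mutual-nearest-neighbor matching with a similarity threshold. This is an assumption-laden illustration: the matching rule, the `threshold` value, and the toy vectors are hypothetical, and PARIS's probabilistic reasoning, which PRASE alternates with this kind of step, is not reproduced here.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return sum(a * b for a, b in zip(u, v)) / (norm_u * norm_v)


def mutual_best_matches(emb1, emb2, threshold=0.5):
    """Keep (e, f) only if each is the other's most similar entity and similarity >= threshold."""
    best12 = {e: max(emb2, key=lambda f: cosine(emb1[e], emb2[f])) for e in emb1}
    best21 = {f: max(emb1, key=lambda e: cosine(emb1[e], emb2[f])) for f in emb2}
    return {(e, f) for e, f in best12.items()
            if best21[f] == e and cosine(emb1[e], emb2[f]) >= threshold}


emb1 = {"kg1:Paris": [1.0, 0.1], "kg1:Berlin": [0.1, 1.0]}
emb2 = {"kg2:Paris": [0.9, 0.2], "kg2:Berlin": [0.2, 0.9], "kg2:Rome": [-1.0, 0.0]}
mappings = mutual_best_matches(emb1, emb2)
```

Requiring mutual agreement is one simple way to filter out some one-sided, likely wrong mappings before they are handed to a reasoning system; it does not perform the logical consistency checking the abstract attributes to PARIS.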
GMH: A General Multi-hop Reasoning Model for KG Completion
Zhang, Yao, Zhang, Xu, Wang, Jun, Liang, Hongru, Jatowt, Adam, Lei, Wenqiang, Yang, Zhenglu
Knowledge graphs are essential for numerous downstream natural language processing applications, but they are typically incomplete, with many facts missing. This has motivated research on the multi-hop reasoning task, which can be formulated as a search process; current models typically perform only short-distance reasoning. However, long-distance reasoning is also vital, as it can connect superficially unrelated entities. To the best of our knowledge, no general framework approaches multi-hop reasoning in both short- and long-distance scenarios. We argue that there are two key issues for long-distance reasoning: i) which edge to select, and ii) when to stop the search. In this work, we propose a general model that resolves these issues with three modules: 1) a local-global knowledge module to estimate the possible paths, 2) a differentiated action dropout module to explore a diverse set of paths, and 3) an adaptive stopping search module to avoid over-searching. Comprehensive results on three datasets demonstrate the superiority of our model, with significant improvements over baselines in both short- and long-distance reasoning scenarios.
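To make "multi-hop reasoning as a search process" concrete, here is a plain breadth-first enumeration of relation paths over KG triples with a fixed hop limit. It is only a baseline illustration under assumptions (the triples and `max_hops` value are made up): GMH replaces the uniform expansion with learned edge selection and replaces the fixed `max_hops` cutoff with an adaptive stopping module.

```python
from collections import deque


def multi_hop_paths(triples, source, target, max_hops=3):
    """Enumerate acyclic relation paths from source to target with at most max_hops edges."""
    adj = {}
    for h, r, t in triples:
        adj.setdefault(h, []).append((r, t))
    paths = []
    queue = deque([(source, [], {source})])
    while queue:
        node, path, seen = queue.popleft()
        if node == target and path:
            paths.append(path)
            continue
        if len(path) == max_hops:
            continue  # fixed stopping rule; GMH learns when to stop instead
        for r, nxt in adj.get(node, []):
            if nxt not in seen:
                queue.append((nxt, path + [(r, nxt)], seen | {nxt}))
    return paths


triples = [("A", "bornIn", "B"), ("B", "locatedIn", "C"),
           ("C", "partOf", "D"), ("A", "livesIn", "D")]
short_only = multi_hop_paths(triples, "A", "D", max_hops=2)
with_long = multi_hop_paths(triples, "A", "D", max_hops=3)
```

With a small hop limit only the one-hop path is found; raising the limit recovers the three-hop path, but in a real KG the number of expanded paths grows quickly with depth, which is the over-searching problem the abstract's adaptive stopping module targets.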